
    Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

    We study the question of when we can provide logarithmic-time direct access to the k-th answer to a Conjunctive Query (CQ) with a specified ordering over the answers, following a preprocessing step that constructs a data structure in time quasilinear in the size of the database. Specifically, we embark on the challenge of identifying the tractable answer orderings that allow for ranked direct access with such complexity guarantees. We begin with lexicographic orderings and give a decidable characterization (under conventional complexity assumptions) of the class of tractable lexicographic orderings for every CQ without self-joins. We then continue to the more general orderings given by the sum of attribute weights and show that, for these, ranked direct access is tractable only in trivial cases. Hence, to better understand the computational challenge at hand, we consider the more modest task of providing access to only a single answer (i.e., finding the answer at a given position), a task that we refer to as the selection problem. We indeed achieve a quasilinear-time algorithm for a subset of the class of full CQs without self-joins, by adapting a solution of Frederickson and Johnson to the classic problem of selection over sorted matrices. We further prove that no other query in this class admits such an algorithm.
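
    The lexicographic case admits a compact illustration for the simplest nontrivial query shape. The sketch below is a minimal example, not the paper's construction: it handles only the full CQ Q(a, b, c) :- R(a, b), S(b, c) under the lexicographic order (a, b, c), and the class name LexDirectAccess and the relation encoding are illustrative assumptions. After quasilinear preprocessing (sorting plus prefix sums over answer counts), any answer position can be resolved in logarithmic time.

```python
from bisect import bisect_right
from collections import defaultdict

class LexDirectAccess:
    def __init__(self, R, S):
        # Preprocessing in quasilinear time: group S by the join attribute b and
        # sort each group, sort R lexicographically, and record how many answers
        # each R-tuple contributes via prefix sums.
        self.S_by_b = defaultdict(list)
        for b, c in S:
            self.S_by_b[b].append(c)
        for cs in self.S_by_b.values():
            cs.sort()
        self.R = sorted((a, b) for a, b in R if b in self.S_by_b)
        self.prefix = [0]
        for _, b in self.R:
            self.prefix.append(self.prefix[-1] + len(self.S_by_b[b]))

    def __len__(self):
        return self.prefix[-1]

    def answer(self, k):
        # Logarithmic-time direct access to the k-th answer (0-indexed) in
        # the lexicographic order (a, b, c).
        if not 0 <= k < len(self):
            raise IndexError(k)
        i = bisect_right(self.prefix, k) - 1        # R-tuple whose block holds k
        a, b = self.R[i]
        c = self.S_by_b[b][k - self.prefix[i]]      # offset within that block
        return (a, b, c)

# Example: all answers of R(a, b) join S(b, c), accessed by position.
R = [(1, "x"), (2, "y"), (1, "y")]
S = [("x", 10), ("x", 20), ("y", 5)]
da = LexDirectAccess(R, S)
print([da.answer(k) for k in range(len(da))])
# [(1, 'x', 10), (1, 'x', 20), (1, 'y', 5), (2, 'y', 5)]
```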

    Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

    We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had previously been studied in isolation. To this end, we extend classic algorithms that find the k shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. These optimality properties are derived for the widely used notion of data complexity, which treats query size as a constant. By performing a careful cost analysis, we are able to uncover a previously unknown tradeoff between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs it beats the batch algorithm even when all results are produced.
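
    The connection to k-shortest-path enumeration can be illustrated with a simple priority-queue baseline. The sketch below is not the paper's optimal algorithm; it assumes non-negative edge weights and an ad-hoc edge-list format, and it enumerates source-to-target paths of a weighted DAG in non-decreasing total weight by best-first search over partial paths.

```python
import heapq

def ranked_paths(edges, source, target):
    """Yield (total_weight, path) for source-to-target paths of a weighted DAG,
    in non-decreasing total weight (edge weights must be non-negative)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    # Best-first search over partial paths: with non-negative weights, complete
    # paths are popped from the heap in sorted order of their total weight.
    heap = [(0, (source,))]
    while heap:
        weight, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            yield weight, path
            continue
        for nxt, w in adj.get(node, ()):
            heapq.heappush(heap, (weight + w, path + (nxt,)))

# Example: stop after the two cheapest paths, without enumerating the rest.
edges = [("s", "a", 1), ("s", "b", 4), ("a", "t", 5), ("b", "t", 1), ("a", "b", 1)]
for rank, (w, p) in enumerate(ranked_paths(edges, "s", "t"), start=1):
    print(rank, w, p)
    if rank == 2:
        break
# 1 3 ('s', 'a', 'b', 't')
# 2 5 ('s', 'b', 't')
```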

    Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries

    We study ranked enumeration for Conjunctive Queries (CQs) where the answers are ordered by a given ranking function (e.g., an ORDER BY clause in SQL). We develop "any-k" algorithms which, without knowing the number k of desired answers, push the ranking into joins and avoid materializing the join output earlier than necessary. For this to be possible, the ranking function needs to obey a certain kind of monotonicity; the supported ranking functions include the common sum-of-weights case, where query answers are compared by sums of input weights, as well as any commutative selective dioid. One core insight of our work is that the problem is closely related to the fundamental task of path enumeration in a weighted DAG. We generalize and improve upon classic research on finding the k-th shortest path, and we unify into the same framework several solutions from different areas that had been studied in isolation. For the time to the k-th ranked CQ answer (for every value of k), our approach is optimal in data complexity precisely for the same class of queries where unranked enumeration is optimal, and it is only slower by a logarithmic factor. In a more careful analysis of combined complexity, we uncover a previously unknown tradeoff between two different any-k algorithms: one has lower complexity when the number of returned results is small, the other when the number is very large. This tradeoff is eliminated under a stricter monotonicity property that we exploit to design a novel algorithm that asymptotically dominates all previously known alternatives, including the well-known algorithm of Eppstein for sum-of-weights path enumeration. We empirically demonstrate the findings of our theoretical analysis in an experimental study that highlights the superiority of our approach over the join-then-rank approach that existing database systems typically follow.
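
    The idea of pushing the ranking into the join, rather than sorting the materialized output, can be sketched for a single binary join under sum-of-weights ranking. The function below is an illustrative assumption rather than the paper's any-k algorithm: it seeds a heap with each left tuple paired with its lightest join partner and advances partners lazily, so the top answers are produced without building the full join result.

```python
import heapq
from collections import defaultdict

def ranked_join(R, S):
    """Lazily yield ((a, b, c), total_weight) for Q(a, b, c) :- R(a, b), S(b, c),
    in non-decreasing sum of the two per-tuple weights."""
    groups = defaultdict(list)                 # join value b -> [(weight, c), ...]
    for (b, c), w in S:
        groups[b].append((w, c))
    for g in groups.values():
        g.sort()                               # lightest partner first
    heap = []
    for (a, b), w in R:                        # seed: each R-tuple with its best partner
        if b in groups:
            heap.append((w + groups[b][0][0], w, a, b, 0))
    heapq.heapify(heap)
    while heap:
        total, w_r, a, b, j = heapq.heappop(heap)
        yield (a, b, groups[b][j][1]), total
        if j + 1 < len(groups[b]):             # advance to the next-lightest partner
            heapq.heappush(heap, (w_r + groups[b][j + 1][0], w_r, a, b, j + 1))

# Example: weighted tuples are pairs (tuple, weight); the top answer appears
# without materializing the whole join output.
R = [(("a1", "x"), 1.0), (("a2", "y"), 0.5)]
S = [(("x", "c1"), 2.0), (("x", "c2"), 0.1), (("y", "c3"), 3.0)]
for ans, w in ranked_join(R, S):
    print(ans, w)
# ('a1', 'x', 'c2') 1.1
# ('a1', 'x', 'c1') 3.0
# ('a2', 'y', 'c3') 3.5
```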

    Efficient Computation of Quantiles over Joins

    We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. A %JQ asks for the answer at a specified relative position (e.g., 50% for the median) under some ordering over the answers to a Join Query (JQ). Our goal is to avoid materializing the set of all join answers and to achieve quasilinear time in the size of the database, regardless of the total number of answers. A recent dichotomy result rules out the existence of such an algorithm for a general family of queries and orders. Specifically, for acyclic JQs without self-joins, the problem becomes intractable for ordering by sum whenever we join more than two relations (and these joins are not trivial intersections). Moreover, even for basic ranking functions beyond sum, such as min or max over different attributes, it has so far not been known whether there is any nontrivial tractable %JQ. In this work, we develop a new approach to solving %JQ and show how it allows us not only to recover known results, but also to generalize them and resolve open cases. Our solution uses two subroutines: the first selects what we call a "pivot answer"; the second partitions the space of query answers according to this pivot and continues searching in one partition, represented as a new %JQ over a new database. For pivot selection, we develop an algorithm that works for a large class of ranking functions that are appropriately monotone. The second subroutine requires a customized construction for the specific ranking function at hand. We show the benefit and generality of our approach by using it to establish several new complexity results. First, we prove the tractability of min and max for all acyclic JQs, thereby resolving the above question. Second, we extend the previous %JQ dichotomy for sum to all partial sums (over all subsets of the attributes). Third, we handle the intractable cases of sum by devising a deterministic approximation scheme that applies to every acyclic JQ.
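
    For the simplest two-relation case with integer weights, the quantile under sum can be computed without materializing the quadratic set of answers. The sketch below uses binary search on the sum value with a counting subroutine; the function names and this strategy are illustrative assumptions and differ from the paper's pivot-and-partition construction.

```python
from bisect import bisect_right

def count_at_most(X, Y, v):
    """Number of pairs (x, y) with x + y <= v; X and Y must be sorted."""
    return sum(bisect_right(Y, v - x) for x in X)

def kth_pair_sum(X, Y, k):
    """Value of the k-th smallest pair sum (1-indexed) over integer weight lists
    X and Y, plus one witness pair, without enumerating all |X| * |Y| pairs."""
    X, Y = sorted(X), sorted(Y)
    lo, hi = X[0] + Y[0], X[-1] + Y[-1]
    while lo < hi:                         # binary search on the sum value
        mid = (lo + hi) // 2
        if count_at_most(X, Y, mid) >= k:
            hi = mid
        else:
            lo = mid + 1
    ys = set(Y)
    witness = next((x, lo - x) for x in X if lo - x in ys)
    return lo, witness

# Example: the median (5th smallest) of the 3 * 3 = 9 pair sums.
X, Y = [1, 4, 7], [2, 3, 10]
print(kth_pair_sum(X, Y, 5))   # (9, (7, 2)): the 5th smallest sum is 9
```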

    Fair Procedures for Fair Stable Marriage Outcomes

    Given a two-sided market where each agent ranks those on the other side by preference, the stable marriage problem calls for finding a perfect matching such that no pair of agents prefer each other to their assigned matches. Recent studies show that the number of stable solutions can be large in practice. Yet the classical solution to the problem, the Gale-Shapley (GS) algorithm, assigns an optimal match to each agent on one side and a pessimal one to each agent on the other side; such a solution may fare well in terms of equity only in highly asymmetric markets. Finding a stable matching that minimizes the sex equality cost, an equity measure expressing the discrepancy of mean happiness between the two sides, is strongly NP-hard. Extant heuristics either (a) oblige some agents to involuntarily abandon their matches, (b) bias the outcome in favor of some agents, or (c) need high-polynomial or unbounded time. We provide the first procedurally fair algorithms that output equitable stable marriages and are guaranteed to terminate in at most cubic time; the key to this breakthrough is the monitoring of a monotonic state function and the use of a selective criterion for accepting proposals. Our experiments with diverse simulated markets show that: (a) extant heuristics fail to yield high equity; (b) the best solution found by the GS algorithm can be very far from optimal equity; and (c) our procedures stand out in both efficiency and equity, even when compared to a non-procedurally-fair approximation scheme.
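
    For reference, a minimal sketch of the men-proposing Gale-Shapley algorithm together with one common formalization of the sex equality cost (the absolute difference between the two sides' total preference ranks) is given below; the exact cost definition and the data layout are illustrative assumptions, not necessarily those used in the paper.

```python
def gale_shapley(men_prefs, women_prefs):
    """Men-proposing Gale-Shapley; each prefs dict maps an agent to an ordered
    list of agents on the other side. Returns a stable matching man -> woman."""
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}        # next woman each man proposes to
    engaged_to = {}                                # woman -> man
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m
        elif rank[w][m] < rank[w][current]:        # w prefers the new proposer
            engaged_to[w] = m
            free_men.append(current)
        else:
            free_men.append(m)
    return {m: w for w, m in engaged_to.items()}

def sex_equality_cost(matching, men_prefs, women_prefs):
    # One possible equity measure: how far apart the two sides' total ranks are.
    men_cost = sum(men_prefs[m].index(w) for m, w in matching.items())
    women_cost = sum(women_prefs[w].index(m) for m, w in matching.items())
    return abs(men_cost - women_cost)

# Example: the men-proposing outcome favors the men (cost 2); the other stable
# matching for this instance would favor the women instead.
men = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
match = gale_shapley(men, women)
print(match, sex_equality_cost(match, men, women))
# {'m2': 'w2', 'm1': 'w1'} 2
```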